Building a Large Knowledge Base from a Structured Source: The CIA World Fact Book
Abstract
The on-line world is populated by an increasing number of knowledge-rich resources. Furthermore, there is a growing trend among authors to provide semantic markup of these resources. This presents a tantalizing prospect. Perhaps we can leverage the person-years of effort invested in building these knowledge-rich resources to create large-scale knowledge bases. The World Fact Book knowledge base has been an experiment in the construction of a large-scale knowledge base from a source authored using semantic markup. The content of the knowledge base is, in large part, derived from the CIA World Fact Book, and covers a broad range of information about the world’s nations. The World Fact Book is a highly structured document with a complex underlying ontology. The structure makes it possible to parse the document in order to carry out the knowledge extraction. However, irregularities of the text written by humans and the complexity of the domain make the knowledge extraction process non-trivial. We describe the process we used to construct the World Fact Book knowledge base, including parsing the source, refining the implicit knowledge, constructing a substantial supporting ontology, and reusing existing ontologies. We also discuss some of the key representational issues addressed and show how the resulting axioms can be used to answer a variety of queries. We hope that the broad accessibility of the resulting knowledge base and its neutral representational format will enable others to work with and extend the content, as well as explore issues of structuring and inferencing in large-scale knowledge bases.
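As a rough illustration of the parsing and refinement steps mentioned in the abstract, the following is a minimal sketch and not the authors' actual pipeline. It assumes a simplified "Field: value" layout; the sample entry (country "Examplia" and all of its values) and the helper names `parse_entry` and `normalize_number` are hypothetical and invented for this example.

```python
import re

# Hypothetical sample entry in a "Field: value" layout.
# The country name and every value are invented for illustration.
SAMPLE = """\
Country: Examplia
Capital: Sample City
Population: 1,234,567 (July 1998 est.)
Area: 10,000 sq km
"""

# One field per line: everything before the first colon is the field name.
FIELD_RE = re.compile(r"^(?P<field>[^:]+):\s*(?P<value>.+)$")


def parse_entry(text):
    """Turn 'Field: value' lines into (subject, predicate, object) triples."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            fields[m.group("field").strip()] = m.group("value").strip()
    # The "Country" field names the subject of all other assertions.
    subject = fields.pop("Country")
    return [(subject, name, value) for name, value in fields.items()]


def normalize_number(value):
    """Read a leading integer, dropping separators and trailing notes."""
    m = re.match(r"^([\d,]+)", value)
    return int(m.group(1).replace(",", "")) if m else None


triples = parse_entry(SAMPLE)
for subject, predicate, obj in triples:
    print(subject, predicate, obj, normalize_number(obj))
```

The second helper hints at why the abstract calls extraction non-trivial: human-written values carry units and qualifiers such as "(July 1998 est.)" that have to be separated from the number before they can be used in a knowledge base.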
Similar resources
Creative Puzzle Generation from Factual Content
Comprehensive knowledge bases can be seen not only as rich sources of factual content (that is, answers) but also as rich sources of questions. In this paper we explore the potential of knowledge resources like the CIA World Fact Book to serve as the generative basis of a series of creative educational puzzles.
A study of the principles and categories related to the world of architecture in the book of Nuzhat Nama-yi Ala'i
Ancient Persian sources are among the most important sources for understanding the past architecture of Iran. Among these texts is the book Nuzhat Nama-yi Ala'i by Shah Mardan Ibn Abi al-Khayr, which was written in the last years of the fifth century AH. This book is an encyclopedia of common sciences of that time and includes various subjects such as animals, plants, jewelry, arithmeti...
Refining Ontologies via Pattern-based Clustering
In this paper we consider the problem of finding subconcepts of a known concept (reference concept) in a given ontology in the light of new knowledge coming from a data source. These subconcepts are discovered by looking for frequent association patterns between the reference concept and other concepts also occurring in the existing ontology. As an illustration, we report preliminary results ob...
NIE: An Approach for Extracting Information from Narrative Web Information Sources
The World Wide Web (WWW) has become an indispensable repository of valuable information on a wide range of different subjects. However, many web information sources (WISs) present their information in a semi-structured format, which makes it difficult to extract and manipulate the information directly. Consequently, there have been many attempts to develop approaches that can automatically gene...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
With the growth of the web, establishing a general framework for data mining and retrieving structured data from the Web poses many challenges. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Publication date: 1999